docs: add key terms to use case intros/tutorial and what is dvc? docs [SEO] #1806

Merged
merged 16 commits into from
Oct 8, 2020

Conversation

jeremydesroches
Contributor

@jeremydesroches jeremydesroches commented Sep 24, 2020

Based on existing search results, I expanded the use case intro docs with data and model versioning references. The tutorial received similar changes — pending review with @jorgeorpinel.

Merged PR #1805 (meant to separate commits but I messed it up 😒).

/docs/user-guide/what-is-dvc.md
Add explicit "machine learning" references including "version machine learning experiments"

/docs/use-cases/index.md
Add "data science use cases", "tools" and "best practices"

/docs/use-cases/versioning-data-and-model-files/index.md
Add "data and model versioning", "versioning (large) data files", and "model versions"

/docs/use-cases/versioning-data-and-model-files/tutorial.md
Add "ML model versions", "dataset versioning", "model and large dataset versioning", "machine learning models", "dataset and ML model versioning"

UPDATE: See results in #1806 (comment)

@jeremydesroches jeremydesroches changed the title from "docs: add key terms to use cases intros and tutorial [SEO]" to "docs: add key terms to use case intros/tutorial and what is dvc? docs [SEO]" Sep 24, 2020
@jeremydesroches jeremydesroches added A: docs Area: user documentation (gatsby-theme-iterative) A: website Area: website labels Sep 24, 2020
@jorgeorpinel
Contributor

jorgeorpinel commented Sep 25, 2020

Reviewed What is DVC? for now ☝️

BTW, you should have access to push branches to this repo @jeremydesroches so no need to use a fork going forward 🙂. Using branches directly on origin will trigger a review app that we can all see online, so it's preferred. Thanks

@jeremydesroches
Contributor Author

> BTW, you should have access to push branches to this repo @jeremydesroches so no need to use a fork going forward 🙂. Using branches directly on origin will trigger a review app that we can all see online, so it's preferred. Thanks

Awesome. I will do that! Thanks @jorgeorpinel

Contributor

@jorgeorpinel jorgeorpinel left a comment

Review of use cases index. Checking PR scope. Please generalize this feedback to the other docs before I get to review them.

that DVC can help with or improve. Our use cases are not written to be run
end-to-end like tutorials. For more general, hands-on experience with DVC,
please see our [Get Started](/doc/tutorials/get-started) instead.
We provide short articles on common ML workflow and data science use cases that

This comment was marked as resolved.

Contributor Author

Yes there is an SEO motivation here: the search term is "data science use cases".

Contributor

I see! Going fwd if you can make some notes in the PR file changes on terms each change is for, or a list of terms in the PR description at least, that would be helpful for reviews 😃

Contributor Author

👍 Definitely. That makes a lot of sense and I'll do that.

Contributor

Does it matter that probably users looking for "data science use cases" are not looking for DVC use cases? I don't want to assume what 1000s of people want, but it sounds like a basic data science question rather than anything to do with structuring DS projects (e.g. using DVC).

So maybe changes like this will bring more traffic but also up the bounce rate. We'll have to try and see, I guess!

Contributor Author

@jeremydesroches jeremydesroches Oct 3, 2020

> Does it matter that probably users looking for "data science use cases" are not looking for DVC use cases?

It's true that the term is not a perfect match, but it is related to the primary subject area (data science). Most non-brand terms are going to be partially related but inexact, as searches for discovery are imprecise (because searchers don't know what DVC is yet).

The search engine is trying to fill in the gaps, so we want to expand on terms that are showing interest within the correct subject area in order to meet them halfway. This article already has some impressions for "use cases", including ML and data science so that's the motivation for this change.

Contributor

@jorgeorpinel jorgeorpinel Oct 5, 2020

OK, cool! Keeping unresolved for future reference.

content/docs/use-cases/index.md (3 outdated review threads, resolved)
Contributor

@jorgeorpinel jorgeorpinel left a comment

Finished reviewing all the changes:

Comment on lines 240 to 241
That's it! We've tracked the second version of the dataset, model, and metrics
in DVC and committed the DVC-files that point to them with Git. Now let's look
Contributor

  • a -> the: Actually 'a' is slightly more correct here.
  • Thanks for the "them committed" fix 👍
  • Let's now -> Now let's: What's the difference?

Contributor Author

Thanks! "a second version" is better. Made the change.

Reverted to "Let's now" — I tried some other versions of the second sentence when I rewrote the first one, and unintentionally switched them. (IMO "Let's now" is a better usage for following a result with a new instruction. 😉 )

@jorgeorpinel
Contributor

@shcheklein would you like to review this first SEO PR too?

@shcheklein
Member

@jeremydesroches @jorgeorpinel a lot of great improvements, guys! thanks!

@shcheklein shcheklein merged commit 5044aa9 into iterative:master Oct 8, 2020
@jeremydesroches
Contributor Author

> BTW, you should have access to push branches to this repo @jeremydesroches so no need to use a fork going forward 🙂. Using branches directly on origin will trigger a review app that we can all see online, so it's preferred. Thanks

Hey @jorgeorpinel, I tried this earlier and got a permission error. Can you check if I have access? Thanks

@jorgeorpinel
Contributor

Sure, we'll check.

jorgeorpinel added a commit that referenced this pull request Oct 8, 2020
> Add "data and model versioning", "versioning (large) data files", and "model versions"
@jeremydesroches
Contributor Author

These four pages were submitted for reindexing and added to a list of docs to track for ongoing SEO performance.

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 19, 2020

OK, checking back on this, let's see if we can find anything on the Search Console now (over 10 days after merge).

Starting with https://dvc.org/doc/user-guide/what-is-dvc,

  • firstly I discovered that there are still some old URLs that used to be under that one which are returning 404s... Should we set up a redirect even though it's pretty late?
  • Looking at the specific page only, there's no obvious change in the general terms (also very hard to tell with the weekly periodicity — any way to smooth it out on a 7-day window here or on some other tool @jeremydesroches ? GA has that option for ex.)
  • Looking at the "machine learning" terms (for that page) there's not much impact either, but at least now it comes up in searches now and then 😋
    image

@jeremydesroches could you please look at the other 3 pages and report whether there is any impact so far/ share links to check back later on? Any other comments/tips on this evaluation process are appreciated. Thanks!

@jeremydesroches jeremydesroches deleted the seo-wk2-use-cases branch November 3, 2020 18:42
@jeremydesroches
Contributor Author

> firstly I discovered that there are still some old URLs that used to be under that one which are returning 404s... Should we set up a redirect even though it's pretty late?

Hi @jorgeorpinel. No, a redirect won't do anything at this point because the old URLs aren't in the index anymore. You can check this for any given URL by clicking the magnifying glass (or searching for it in the bar at top on SC).

> could you please look at the other 3 pages and report whether there is any impact so far/ share links to check back later on? Any other comments/tips on this evaluation process are appreciated.

My assessment for each of the four pages is included below. The process I use is as follows:

  1. set reasonable comparison period (in this case time since merge vs. previous period of same length)
  2. review comparison curves/amounts for differences in clicks, impressions, CTR, average position
  3. where there is a difference, switch to queries table and sort by most clicks or impressions
  4. in descending order, look for notable click/impression/CTR/position gains or losses by term
  5. when looking at a single metric, sort by difference in descending order to see terms with largest change
  6. note patterns of change for target terms and new patterns/terms that appear (if any)
  7. look at the 6 month or longer curve to see if any change in long-term trends
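
As a rough illustration of steps 1 to 6, the comparison could be sketched in Python once the per-query numbers for both periods are available outside SC. Everything here is an assumption for illustration: the `QueryStats` type, the `compare_periods` and `top_changes` helpers, and the sample numbers are hypothetical and are not data from this PR.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    clicks: int
    impressions: int
    position: float  # average position; a lower number is better

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 when there were no impressions.
        return self.clicks / self.impressions if self.impressions else 0.0


def compare_periods(before, after):
    """Per-query differences (after minus before) for queries seen in either period."""
    empty = QueryStats(0, 0, 0.0)
    rows = []
    for query in sorted(set(before) | set(after)):
        b = before.get(query, empty)
        a = after.get(query, empty)
        in_both = query in before and query in after
        rows.append({
            "query": query,
            "clicks_diff": a.clicks - b.clicks,
            "impressions_diff": a.impressions - b.impressions,
            "ctr_diff": a.ctr - b.ctr,
            # Position change is only meaningful when the query appears in both
            # periods; filter out None before sorting on this field.
            "position_diff": a.position - b.position if in_both else None,
            "new": query not in before,  # step 6: terms that newly appear
        })
    return rows


def top_changes(rows, metric, n=5):
    """Step 5: sort by the difference in one metric, largest gain first."""
    return sorted(rows, key=lambda r: r[metric], reverse=True)[:n]


# Made-up numbers, for illustration only.
before = {
    "model versioning": QueryStats(10, 200, 8.0),
    "dvc": QueryStats(50, 400, 2.0),
}
after = {
    "model versioning": QueryStats(19, 260, 6.5),
    "dvc": QueryStats(48, 410, 2.1),
    "data science use cases": QueryStats(0, 120, 40.0),
}

rows = compare_periods(before, after)
for row in top_changes(rows, "impressions_diff", n=3):
    print(row["query"], row["impressions_diff"], row["new"])
```

Sorting on `clicks_diff` instead surfaces the terms that actually gained clicks, which is usually the more important signal for a target term.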

Sorry, no way to smooth things out to weekly in SC. For a comparison period this short it helps (me, anyway) to see the peaks/valleys. The curves are nice but in the end the aggregate and term-level gain/loss are the key metrics for any given period.
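
If a smoothed curve is still wanted, one possible workaround is to smooth the daily numbers outside SC. The sketch below assumes the daily clicks have already been exported into a plain list; the `rolling_mean` helper and the sample values are hypothetical.

```python
def rolling_mean(values, window=7):
    """Trailing moving average over `window` days.

    The first window-1 points are averaged over the days available so far.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the day that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up daily clicks with a weekend dip, for illustration only.
daily_clicks = [12, 15, 14, 13, 11, 4, 3, 13, 16, 15, 12, 12, 5, 2]
print([round(x, 1) for x in rolling_mean(daily_clicks)])
```

A 7-day window averages out the weekday/weekend cycle, at the cost of hiding the peaks and valleys mentioned above.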

What is DVC? (Merge - Nov 2 vs. previous period)
Incrementally more clicks and impressions. Improved and more consistent average position. Additional clicks/impressions and improved position on "mlflow"-related terms. Looking at a longer period, position is slightly higher and more consistent after the update. As you pointed out, the page started to appear for new terms as well (even more now if you check your link again). Overall, a good example of how even small updates can spur changes to the index and, across many pages, add up to significant results.

Use Cases Index (Merge - Nov 2 vs. previous period)
Big jump in impressions (3.5x). Good example of page being "auditioned" by Google to see where it should fall in index. The bump is likely to be temporary if there aren't more clicks or further changes. Most of the new impressions were around "use cases", notably "data science use cases" (the target term for the changes) and "ml use cases". The increased impressions show there is interest and it's likely we could expand further to potentially improve clicks and position.

Versioning Data and Model Files Index (Merge - Nov 2 vs. previous period)
Not a lot of overall movement here, but an improvement in average position is always good. There were 9 added clicks (10% increase) from the "model versioning" target term, so that worked. Expect we will have a lot more to talk about here when your latest changes are merged. Updates and improvements are good for search results!

Versioning Data and Model Files Tutorial (Merge - Nov 2 vs. previous period)
Slight growth in clicks and impressions. A few additional clicks from target terms "model versioning" and "dataset versioning", but nothing groundbreaking (yet!). Growth in impressions for target terms "data versioning" and "dataset versioning" too. Most significant here is an upward trend in average position over the period but also against the long term trend. This is already a fairly popular page that produces a decent number of clicks, so only slight movement is expected from minor changes. Further updates could continue the uptrend — recommend a combination of content updates and term expansion.

@jorgeorpinel
Contributor

OK I'll take a look at your specific reports ASAP. For now I just wanted to follow up on my own mini report, as there is now more of an impact, both in what-is-dvc, which seems to have a consistently improved avg position:

image

But especially in the "machine learning" term, which is starting to come alive:

image

So good work with those even if the change is small so far. At least we know there is a measurable impact!

@jorgeorpinel
Contributor

jorgeorpinel commented Nov 25, 2020

Hi. Doing a final check on this, the first PR. Here are the results I can see:

My 2 previous mini-reports continued the same: improvements in the secondary? metrics (impressions and position).

For the other docs I just checked the weeks from the time they were merged vs. the same number of weeks (6) before that:

What is DVC? - link - more impressions and better position, yet no more clicks. Maybe people just aren't that interested in this topic.

image

Use Cases Index - link - This one actually has ~10% more impressions AND ~15% more clicks, so the CTR improved 👍 . It suggests to me that this is one of the sections with the most potential for growth.
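
The CTR conclusion follows from simple arithmetic, since CTR is clicks divided by impressions. A minimal sketch using the approximate growth figures quoted above (the helper name `relative_ctr_change` is hypothetical):

```python
def relative_ctr_change(impressions_growth, clicks_growth):
    """Relative CTR change given relative growth in impressions and clicks.

    CTR = clicks / impressions, so the new CTR is the old CTR multiplied by
    (1 + clicks_growth) / (1 + impressions_growth).
    """
    return (1 + clicks_growth) / (1 + impressions_growth) - 1


# ~10% more impressions and ~15% more clicks, as reported above.
print(f"{relative_ctr_change(0.10, 0.15):+.1%}")  # prints +4.5%
```

So the CTR gain is relative (roughly 4.5% of the previous CTR), not 5 percentage points.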

Edit: Fixed link and added image for exact URL (and not *URLs including individual use cases). Strangely there's no click-level data on terms, but it looks like an increase in impressions for some that we targeted (added screenshot). -JD

image

image

Versioning Data and Model Files use case - link - similar to the previous one (even better clicks and CTR improvement)
👍 but the position actually went down a little 🤷 — hopefully it will improve even more once the new versioning use case is released (almost...)

Edit: Fixed link to show exact use case URL (and not *URLs including tutorial). Position actually improved too. 🙂 -JD

image

Versioning Tutorial - link - also good improvements (around 25% in clicks, for example)

image

So in general these are good results cc @shcheklein

@jeremydesroches please start posting similar reports in your other PRs next week. Thanks

@jeremydesroches
Contributor Author

> @jeremydesroches please start posting similar reports in your other PRs next week.

Thanks for checking on this @jorgeorpinel. Yes, I’ll do the same for other PRs this week along with the other GSoD reports.

shcheklein pushed a commit that referenced this pull request Dec 6, 2020
* cases: [WIP] befin rewriting Versioning:
explain why versioning large files is important/a thing
per #1716 (comment)

* cases: give some sense of why versioning data and models is important
per #1747 (comment)

* guide: why DVC is the way to Version data (sell philosophy)
per #1747 (comment)

* cases: add example section explaining why data versionig is
is important, and how it looks with DVC
per
#1747 (comment)

* cases: wrap up Versioning full draft

* cases: rename demo section in Versioning, roll back checkout img, et al.

* cases: some more versioning updates

* cases: shorten versioning intro

* cases: add bullet list of Versioning advantages
per #1747 (comment)

* cases: shorten Why DVC section in Versioning

* term: data modeling -> data engineering
per #1747 (review)

* cases: make advantages section in Data Registry (consistency)

* cases: make separate Versioned storage section

* cases: rewrite intro and other changes to Versioning
per #1747 (comment)

* cases: cover gap between Versioning and (remote) storage, link to GS
also per #1747 (comment)

* use-cases: reapply SEO keyword changes from #1806

> Add "data and model versioning", "versioning (large) data files", and "model versions"

* cases: make p about storage less overlapping to previous one
per #1747 (review)

* cases: add paragraph about versioning advantages before DVC's motivation
per #1747 (review)

* cases: simplify lists of advantages in Versioning (and Data Reg)
rel #1747 (review)

* cases: limitation->constraint (to avoid a redundancy)

* guide: move DVC is not Git! from use cases to What is DVC?
rel #1747 (review)

* cases: ~~Summary of~~ Advantages (H2)

* cases: rewrite parts of the DVC motivation paragraphs in Versioning

* cases: improve vrsng intro and dedupe bullet lists

* cases: rename Advantages sectino of vrsng
per #1747 (review)

* cases: expand on How it looks (vrsng) with focus on workspace
per #1747 (review)

* guide: improve DVC is not Git! section
per #1747 (review)

* cases: rename Versioning use case (why "Files"?)
per #1747 (review)

* cases: rewrite (again) the intro to vrsng
per #1747 (comment)

* cases: improve versioning intro (more coherent)

* cmd: quick term update

* cases: update links to Versioning use case

* cases: refine Versioning intro, add proposed figure

* cases: summarize, simplify, focus on the essence, et al.
and propose new "Versioned storage" use case

* cases: add redirect for new Versioning use case location

* cases: merge How it looks + Version control sections

* cases: simplify versioning-data-and-models#how-it-looks

* Revert "redirect for new Versioning use case URL" 12bc7ed and
put back the files and nav

* cases: rewrite intro to improve motivation and
post a draft figure proposal

* cases: update Why DVC and benefits list
based on https://docs.google.com/document/d/1jmvbsRC2JhzqAF0eTGu0tX9ydMNndiBviCHq5ezzfEY/edit

* cases: actually revert URL change from recent commit

* cases: more updates to the benefits bullets in Versioning

* cases: rewrite How it looks (& feels) section

* cases: remove non-essential info. from How it looks section of Versioning (a little aggressive)

* cases: simplify How it looks per David and some of Ivan's feedback
(remove cache mentions)

* cases: remove H2s temporarily, simplify benefits bullet list, et al.

* cses: rewrite benefit bullets and simplify how it feels section

* cases: make bullet list into paragraph temp.

* cases: wrap up Vrsng? (text)

* cases: hardcode colums in How it feels section of Vrsng

* cache: simplify it's structure explanation and add CAS term (from Vrsng use case)

* guide: revert changes to this section for now

* cases: polish latest iteration of Versioning use case

* cases: next iteration of Versioning page
per private feedback. Some issues may still be outstanding, will send smalles commits next

* cases: polishing my last iteration of the Vsng page

* remove a bunch of info from Vrsng to simplify again

* cases: minor iteration of Vrsng, pending benefits list

* guide: updates to What is DVC
per #1747 (review)
and #1747 (review)

* cmd: roll-back unrelated changes (stashed elsewhere for now)
per #1747 (review)
and #1747 (review)

* cases: work on benefits of Vrsng

* cases: more work on benefits of Vrsng

* cases: remove emojis; improve benefits list; add refs to other cases

* cses: clarify about cache and about metafiles in Versioning

* cases: simplify p about roll back/fwds; split benefit about data regs

* cases: change BEFORE to be similar to the top fig.

* cases: another iteration of Versioning

* cases: simplify Versioning again

* cases: improvements on Vrsng per direct feedback

* cases: more updates to latest text and figures

* cases: rephrase Vrsng benefits list

* cases: revert to previous draft fig

* cases: update 2nd figure draft, and reorder codification p

* cases: rework Vrsng benefits and
other small improvements and
removed advanced topics (for a new section coming up)

* cases: draft What's Next section added with advanced scenarios for Vrsng

* cases: simplify 2nd figure

* cases: make first Vrsg figure shorter

* cases: merge advanced scenarios with benefits list

* cases: roll back changes to Data Regs
per https://github.com/iterative/dvc.org/pull/1747/files#r515533725

* cases: improvements per Dmitry's feedback...
see #1747 (review)

* cases: train_feats > features in figures for Vrsng

* cases: rename Vrng Tutorial label in nav (use emoji)

* cases: explain simple file naming a bit more
per #1747 (comment)

* cases: Vrng copy edits

* cases: add efficient data mgmt benefit
per #1747 (comment)

* cases: reorder Vrsg benefits list
per #1747 (comment)

* cases: rewrite file naming and data mgmt benefits of Vrsg

* cases: expand story to cover storage and data management
and update benefits

* cases: generalized Vrsg benefits

* cases: separate data mgmt from versioning (through codification) in Vrsg

* Make note about other guides, refs, and tutorial (Vrng)

* cases: emphasize Simplicity benefit of Vrng is the opposite of "complicated"

* cases: another rewrite of text and benefits

* cases: copy edits to latest Vrng iteration, and append next steps paragaph (bottom)

* cases: another iteration of Versioning use case

* cases: clarify data mgmt is for data in Vrng benefits
@jeremydesroches
Contributor Author

OK @jorgeorpinel, I've updated the other GSoD PRs with images now too. Please check my notes above as I added some images and updates to your original findings for this PR.

Labels
A: docs Area: user documentation (gatsby-theme-iterative) A: website Area: website
3 participants